Figure 1: Chromosome 11. The location of OR10G4 is indicated by a red rectangle

Figure 1: Chromosome 11. The location of OR10G4 is indicated by a red rectangle

Introduction

This project is going to be a preliminary exploration of population genetics in R. I’m going to try and take genomic data from 1KG and replicate some of the analyses that I learned to do last year with Lisa in Perl. The gene that I will be using as my guinea pig is the gene OR10G4, which was found by Mainland et al.1 to have SNPs that are implicated in differences in olfactory phenotypes in humans. The goal of this study is to assess whether the SNPs in this gene shows signatures of selection acting on them. This would implicate that human olfaction is subject to environmental pressures, which is not something that has been properly looked at. So, let’s see if we can’t get some of this stuff to work.

Background

The largest gene family in humans is the OR gene family, which encodes our roughly 400 intact odor receptors (and around 800 receptors total). These odorant receptors have been shown to have high genetic variability among individuals, which affects odor perception2. SNPs in many of these genes have been shown to affect the olfaction phenotypes of individuals, in some cases making odorants smell like completely different scents to different people. One large example of this is androstenone, which is an “odorous steroid derived from testosterone”. On an individual level, the steroid’s odor perception can vary from smelling “bad,” to smelling “good,” to being completely odorous3. Finally, it has been shown by Mainland et al. that individuals from 1KG populations share functional differences across 27 various odor receptors (see figure below)4.
Figure 2: Figure 5 from Mainland et al., with description


The gene OR10G4 was the subject of testing by Mainland et al. for looking at participants’ perception of three odorants: guaiacol, vanillin, and ethyl vanillin. In particular, four alleles were chosen with MAF>4% to judge whether they aided in the perception (particularly, the intensity of which they were smelled) of these odorants. It was found that 15% of the variation in guaiacol scent was caused by the genotypes in OR10G4, and none of the variation in intensity of vanillin or ethyl vanillin was explained by these alleles5. Because guaiacol is a part of many natural flavors, I decided to look at OR10G4 in this study, because logically, it is possible for at least one of these genes (hopefully OR10G4) that contains odorant perception-altering SNPs to have different genotypes more prevalent in

Methodology/Results

Ok so I’m gonna start out by getting some data in to R. I’ve used the Ensembl Data Slicer 6 to get a VCF file of all of the SNPs in the gene OR10G4 (located on chr11:123,886,282-123,887,217 in the cytogenetic band 11q24.2). I’m going to use the package vcfR for the initial read-in of the data.

library(vcfR)
## 
##    *****       ***   vcfR   ***       *****
##    This is vcfR 1.5.0 
##      browseVignettes('vcfR') # Documentation
##      citation('vcfR') # Citation
##    *****       *****      *****       *****
OR10G4 <- read.vcfR("OR10G4.vcf.gz", verbose = TRUE)
## 
Meta line 252 read in.
## All meta lines processed.
## Character matrix gt created.
## Character matrix gt rows: 73
## Character matrix gt cols: 2513
## skip: 0
## nrows: 73
## row_num: 0
## 
## 
Processed variant: 73
## All variants processed

I originally wanted to try creating a detailed DNAbin object here, like, with additional information besides the original vcfR object. I want to eventually learn how to attach a fasta file as a reference sequence to a DNAbin made from a vcfR object, but that will not happen with this project. Instead, you can technically make a DNAbin just from a vcfR object, so that’s what I’m going to do for now. If there’s errors in my analyses because my DNAbin lacks additional info, I apologize. So, here’s my DNAbin object:

myDNA <- vcfR2DNAbin(OR10G4)
## After extracting indels, 72 variants remain.
## 
Variant 3 processed

Ok, now that that is made, I’m going to do some experimental population genetic analyses using VCF file I have now, which contains all of the 1KG populations. Later, I will choose a couple to look at individually, but these analyses will be used as a) a place to test all of my analyses to make sure they work, and b) as something I can compare everything back to at the end. The tests I will be performing are: Hardy-Weinberg, Tajima’s D (this is the analysis that uses the DNAbin object, so I hope this works…), Heterozygosity, and LD.

library(pegas)
## Loading required package: ape
## Warning: package 'ape' was built under R version 3.4.2
## Loading required package: adegenet
## Warning: package 'adegenet' was built under R version 3.4.2
## Loading required package: ade4
## 
##    /// adegenet 2.1.0 is loaded ////////////
## 
##    > overview: '?adegenet'
##    > tutorials/doc/questions: 'adegenetWeb()' 
##    > bug reports/feature requests: adegenetIssues()
## 
## Attaching package: 'pegas'
## The following object is masked from 'package:ade4':
## 
##     amova
## The following object is masked from 'package:ape':
## 
##     mst
## The following object is masked from 'package:vcfR':
## 
##     getINFO
#hardy-weinberg test
pegas.data <- vcfR2genind(OR10G4, sep = "[|/]")
hw.test(pegas.data, B = 0)
##                    chi^2 df Pr(chi^2 >)
## rs547870    9.988014e-05  1 0.992026070
## rs138655388 4.905877e-03  1 0.944160255
## rs79057843  3.905388e+00  1 0.048131512
## rs11219407  5.209420e+00  1 0.022464833
## rs539322343 9.988014e-05  1 0.992026070
## rs557336616 8.996398e-04  1 0.976071841
## rs182148733 9.988014e-05  1 0.992026070
## rs536792742 9.988014e-05  1 0.992026070
## rs201867883 9.988014e-05  1 0.992026070
## rs574962070 9.988014e-05  1 0.992026070
## rs542394318 9.988014e-05  1 0.992026070
## rs142498161 9.988014e-05  1 0.992026070
## rs572702527 9.988014e-05  1 0.992026070
## rs145971999 1.599999e-03  1 0.968093136
## rs138504511 3.996802e-04  1 0.984049752
## rs12422129  5.625711e-01  1 0.453226162
## rs543960796 9.988014e-05  1 0.992026070
## rs146142166 3.996802e-04  1 0.984049752
## rs529332605 9.988014e-05  1 0.992026070
## rs200481059 2.798321e+00  1 0.094363074
## rs140019860 1.599999e-03  1 0.968093136
## rs533269341 9.988014e-05  1 0.992026070
## rs551254667 9.988014e-05  1 0.992026070
## rs113201955 5.806317e-02  1 0.809583790
## rs536630067 9.988014e-05  1 0.992026070
## rs199564594 2.500998e-03  1 0.960114437
## rs148563076 1.599999e-03  1 0.968093136
## rs397832341 3.996802e-04  1 0.984049752
## rs554354023 9.988014e-05  1 0.992026070
## rs572538067 9.988014e-05  1 0.992026070
## rs147538417 1.696094e-02  1 0.896381143
## rs558465945 8.996398e-04  1 0.976071841
## rs144654389 9.988014e-05  1 0.992026070
## rs503223    1.435609e+00  1 0.230851150
## rs561904511 3.996802e-04  1 0.984049752
## rs574079030 9.988014e-05  1 0.992026070
## rs541128230 9.988014e-05  1 0.992026070
## rs186225765 9.988014e-05  1 0.992026070
## rs546568617 4.440115e-02  1 0.833109050
## rs3017763   2.259923e-02  1 0.880503938
## rs3017764   2.259923e-02  1 0.880503938
## rs150347384 4.905877e-03  1 0.944160255
## rs1893764   1.306851e+00  1 0.252965636
## rs548590301 9.988014e-05  1 0.992026070
## rs567111591 9.988014e-05  1 0.992026070
## rs4084209   4.752980e+00  1 0.029247598
## rs201667584 9.988014e-05  1 0.992026070
## rs144455396 1.599999e-03  1 0.968093136
## rs201229870 9.988014e-05  1 0.992026070
## rs148375252 9.988014e-05  1 0.992026070
## rs141555350 9.988014e-05  1 0.992026070
## rs11219408  4.440115e-02  1 0.833109050
## rs191928020 9.988014e-05  1 0.992026070
## rs116021926 1.623271e-01  1 0.687022952
## rs144615790 1.599999e-03  1 0.968093136
## rs559573395 9.988014e-05  1 0.992026070
## rs577451354 9.988014e-05  1 0.992026070
## rs4936880   1.455091e+00  1 0.227712807
## rs563115832 9.988014e-05  1 0.992026070
## rs200511194 9.988014e-05  1 0.992026070
## rs548855939 3.996802e-04  1 0.984049752
## rs144812642 9.988014e-05  1 0.992026070
## rs146687807 8.996398e-04  1 0.976071841
## rs112431396 3.996802e-04  1 0.984049752
## rs201358968 3.996802e-04  1 0.984049752
## rs150326783 1.967856e-02  3 0.999270129
## rs370575449 9.988014e-05  1 0.992026070
## rs4936881   1.290904e+00  1 0.255881506
## rs61908597  9.146139e+00  1 0.002492434
## rs574009321 3.996802e-04  1 0.984049752
## rs4936882   5.180774e+00  1 0.022838150
## rs148404016 2.650555e-01  1 0.606668031
## rs202048997 9.988014e-05  1 0.992026070
#Tajima's D test
tajima.test(myDNA)
## $D
## [1] -1.397705
## 
## $Pval.normal
## [1] 0.1622016
## 
## $Pval.beta
## [1] 0.1272968
#Heterozygosity
het1 <- summary(pegas.data)
het1
## 
## // Number of individuals: 2504
## // Group sizes: 2504
## // Number of alleles per locus: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2
## // Number of alleles per group: 147
## // Percentage of missing data: 0 %
## // Observed heterozygosity: 0 0 0.41 0.43 0 0 0 0 0 0 0 0 0 0 0 0.16 0 0 0 0.06 0 0 0 0.01 0 0 0 0 0 0 0.01 0 0 0.24 0 0 0 0 0.01 0.01 0.01 0 0.14 0 0 0.43 0 0 0 0 0 0.01 0 0.02 0 0 0 0.48 0 0 0 0 0 0 0 0.01 0 0.48 0.07 0 0.43 0.02 0
## // Expected heterozygosity: 0 0 0.43 0.45 0 0 0 0 0 0 0 0 0 0 0 0.16 0 0 0 0.06 0 0 0 0.01 0 0 0 0 0 0 0.01 0 0 0.23 0 0 0 0 0.01 0.01 0.01 0 0.15 0 0 0.45 0 0 0 0 0 0.01 0 0.02 0 0 0 0.49 0 0 0 0 0 0 0 0.01 0 0.49 0.08 0 0.45 0.02 0
names(het1)
## [1] "n"         "n.by.pop"  "loc.n.all" "pop.n.all" "NA.perc"   "Hobs"     
## [7] "Hexp"
plot(het1$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", 
     main="Observed heterozygosity per locus")

plot(het1$Hobs, het1$Hexp, xlab="Hobs", ylab="Hexp", 
     main="Expected heterozygosity as a function of observed heterozygosity per locus")

#LD (this one may get tricky)

library(pegas)

loci <- genind2loci(pegas.data)

I am splitting up this chunk to talk more about my LD analysis. So in order to check out LD in the way that I would like to by using the LDscan function, we need to know whether our data is “phased,” which, I don’t know what that means but it is apparently easier to work with phased data than unphased. Because there’s apparently no easy way to “phase” the data from the file type I converted from, we may be shit out of luck on that front. That being said, I have an idea of how I want to do LD if our data is unphased.

is.phased(loci)

(I split that function up from the rest of the LD analysis so I could hide the results because they were very long) Welp, time to go with plan B, which is use the LD2 function. As I learned the hard way (lol), you specify the locus by which number SNP it is in the gene, and not by the SNP ID number. Since there are 73 SNPs in this gene, I’ll just first look at whether there is LD between the first and last SNP. After that, I may just play around with other SNP numbers.

LD2(loci, locus = c(1, 73), details = TRUE)
## $Delta
##               0             1
## 0 -7.974461e-08  7.974461e-08
## 1  7.974461e-08 -7.974461e-08
## 
## $T2
##           T2           df        P-val 
## 0.0003996802 1.0000000000 0.9840497518
#Well that's shit. Let's try some more. 

LD2(loci, locus = c(1, 30), details = TRUE)
## $Delta
##               0             1
## 0 -7.974461e-08  7.974461e-08
## 1  7.974461e-08 -7.974461e-08
## 
## $T2
##           T2           df        P-val 
## 0.0003996802 1.0000000000 0.9840497518
LD2(loci, locus = c(10, 73), details = TRUE)
## $Delta
##               0             1
## 0 -7.974461e-08  7.974461e-08
## 1  7.974461e-08 -7.974461e-08
## 
## $T2
##           T2           df        P-val 
## 0.0003996802 1.0000000000 0.9840497518
LD2(loci, locus = c(10, 30), details = TRUE)
## $Delta
##               0             1
## 0 -7.974461e-08  7.974461e-08
## 1  7.974461e-08 -7.974461e-08
## 
## $T2
##           T2           df        P-val 
## 0.0003996802 1.0000000000 0.9840497518

So here is what I learned from doing this general look:

For the rest of this study, I’m going to focus on one population from each broader region (Europe, Africa, Americas, South Asia, East Asia). I can separate these populations out with the Data Slicer, so I’m going to read in some population data separately. The populations I’m going with are: JPT (Japanese in Tokyo, Japan), FIN (Finnish in Finland), LWK (Luhya in Webuye, Kenya), PUR (Puerto Ricans from Puerto Rico), and PJL (Punjabi from Lahore, Pakistan). When I have all of those population separated out I will begin testing each of them for signatures of selection.

#importing data for just the populations specified above

JPT <- read.vcfR("JPT.vcf.gz", verbose = TRUE)
## 
Meta line 253 read in.
## All meta lines processed.
## Character matrix gt created.
## Character matrix gt rows: 73
## Character matrix gt cols: 113
## skip: 0
## nrows: 73
## row_num: 0
## 
## 
Processed variant: 73
## All variants processed
JPT
## ***** Object of Class vcfR *****
## 104 samples
## 1 CHROMs
## 73 variants
## Object size: 0.1 Mb
## 0 percent missing data
## *****        *****         *****
FIN <- read.vcfR("FIN.vcf.gz", verbose = TRUE)
## 
Meta line 253 read in.
## All meta lines processed.
## Character matrix gt created.
## Character matrix gt rows: 73
## Character matrix gt cols: 108
## skip: 0
## nrows: 73
## row_num: 0
## 
## 
Processed variant: 73
## All variants processed
FIN
## ***** Object of Class vcfR *****
## 99 samples
## 1 CHROMs
## 73 variants
## Object size: 0.1 Mb
## 0 percent missing data
## *****        *****         *****
LWK <- read.vcfR("LWK.vcf.gz", verbose = TRUE)
## 
Meta line 253 read in.
## All meta lines processed.
## Character matrix gt created.
## Character matrix gt rows: 73
## Character matrix gt cols: 108
## skip: 0
## nrows: 73
## row_num: 0
## 
## 
Processed variant: 73
## All variants processed
LWK
## ***** Object of Class vcfR *****
## 99 samples
## 1 CHROMs
## 73 variants
## Object size: 0.1 Mb
## 0 percent missing data
## *****        *****         *****
PUR <- read.vcfR("PUR.vcf.gz", verbose = TRUE)
## 
Meta line 253 read in.
## All meta lines processed.
## Character matrix gt created.
## Character matrix gt rows: 73
## Character matrix gt cols: 113
## skip: 0
## nrows: 73
## row_num: 0
## 
## 
Processed variant: 73
## All variants processed
PUR
## ***** Object of Class vcfR *****
## 104 samples
## 1 CHROMs
## 73 variants
## Object size: 0.1 Mb
## 0 percent missing data
## *****        *****         *****
PJL <- read.vcfR("PJL.vcf.gz", verbose = TRUE)
## 
Meta line 253 read in.
## All meta lines processed.
## Character matrix gt created.
## Character matrix gt rows: 73
## Character matrix gt cols: 105
## skip: 0
## nrows: 73
## row_num: 0
## 
## 
Processed variant: 73
## All variants processed
PJL
## ***** Object of Class vcfR *****
## 96 samples
## 1 CHROMs
## 73 variants
## Object size: 0.1 Mb
## 0 percent missing data
## *****        *****         *****

Now that we have our separate population data, we can repeat our analyses with it. I will start with the JPT data.

#JPT

library(pegas)

#hardy-weinberg test
pegas.JPT <- vcfR2genind(JPT, sep = "[|/]")
hw.test(pegas.JPT, B = 0)
##                   chi^2 df Pr(chi^2 >)
## rs547870    0.000000000  0  1.00000000
## rs138655388 0.000000000  0  1.00000000
## rs79057843  0.512073492  1  0.47424263
## rs11219407  0.068376068  1  0.79371606
## rs539322343 0.000000000  0  1.00000000
## rs557336616 0.000000000  0  1.00000000
## rs182148733 0.000000000  0  1.00000000
## rs536792742 0.000000000  0  1.00000000
## rs201867883 0.000000000  0  1.00000000
## rs574962070 0.000000000  0  1.00000000
## rs542394318 0.000000000  0  1.00000000
## rs142498161 0.000000000  0  1.00000000
## rs572702527 0.000000000  0  1.00000000
## rs145971999 0.000000000  0  1.00000000
## rs138504511 0.000000000  0  1.00000000
## rs12422129  2.314215073  1  0.12819600
## rs543960796 0.000000000  0  1.00000000
## rs146142166 0.000000000  0  1.00000000
## rs529332605 0.000000000  0  1.00000000
## rs200481059 0.265279053  1  0.60651635
## rs140019860 0.039984621  1  0.84151065
## rs533269341 0.000000000  0  1.00000000
## rs551254667 0.002427128  1  0.96070740
## rs113201955 0.000000000  0  1.00000000
## rs536630067 0.000000000  0  1.00000000
## rs199564594 0.000000000  0  1.00000000
## rs148563076 0.000000000  0  1.00000000
## rs397832341 0.000000000  0  1.00000000
## rs554354023 0.002427128  1  0.96070740
## rs572538067 0.000000000  0  1.00000000
## rs147538417 0.000000000  0  1.00000000
## rs558465945 0.000000000  0  1.00000000
## rs144654389 0.000000000  0  1.00000000
## rs503223    2.954172453  1  0.08565615
## rs561904511 0.000000000  0  1.00000000
## rs574079030 0.000000000  0  1.00000000
## rs541128230 0.000000000  0  1.00000000
## rs186225765 0.000000000  0  1.00000000
## rs546568617 0.000000000  0  1.00000000
## rs3017763   0.000000000  0  1.00000000
## rs3017764   0.000000000  0  1.00000000
## rs150347384 0.000000000  0  1.00000000
## rs1893764   2.314215073  1  0.12819600
## rs548590301 0.000000000  0  1.00000000
## rs567111591 0.000000000  0  1.00000000
## rs4084209   0.156896935  1  0.69202965
## rs201667584 0.000000000  0  1.00000000
## rs144455396 0.000000000  0  1.00000000
## rs201229870 0.000000000  0  1.00000000
## rs148375252 0.000000000  0  1.00000000
## rs141555350 0.000000000  0  1.00000000
## rs11219408  0.000000000  0  1.00000000
## rs191928020 0.000000000  0  1.00000000
## rs116021926 0.000000000  0  1.00000000
## rs144615790 0.000000000  0  1.00000000
## rs559573395 0.000000000  0  1.00000000
## rs577451354 0.000000000  0  1.00000000
## rs4936880   0.035423906  1  0.85071016
## rs563115832 0.000000000  0  1.00000000
## rs200511194 0.000000000  0  1.00000000
## rs548855939 0.000000000  0  1.00000000
## rs144812642 0.000000000  0  1.00000000
## rs146687807 0.000000000  0  1.00000000
## rs112431396 0.000000000  0  1.00000000
## rs201358968 0.000000000  0  1.00000000
## rs150326783 0.000000000  0  1.00000000
## rs370575449 0.000000000  0  1.00000000
## rs4936881   0.035423906  1  0.85071016
## rs61908597  0.000000000  0  1.00000000
## rs574009321 0.000000000  0  1.00000000
## rs4936882   0.068376068  1  0.79371606
## rs148404016 0.000000000  0  1.00000000
## rs202048997 0.000000000  0  1.00000000
#All of the alleles in this population are in HWE

#Tajima's D test

JPTdna <- vcfR2DNAbin(JPT)
## After extracting indels, 72 variants remain.
## 
Variant 3 processed
tajima.test(JPTdna)
## $D
## [1] 1.331978
## 
## $Pval.normal
## [1] 0.1828673
## 
## $Pval.beta
## [1] 0.2023292
#D-value is 1.33, but P-value is not significant, so we cannot interpret this as selection. 

#Heterozygosity
het2 <- summary(pegas.JPT)
het2
## 
## // Number of individuals: 104
## // Group sizes: 104
## // Number of alleles per locus: 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1
## // Number of alleles per group: 86
## // Percentage of missing data: 0 %
## // Observed heterozygosity: 0 0 0.45 0.37 0 0 0 0 0 0 0 0 0 0 0 0.26 0 0 0 0.1 0.04 0 0.01 0 0 0 0 0 0.01 0 0 0 0 0.29 0 0 0 0 0 0 0 0 0.26 0 0 0.36 0 0 0 0 0 0 0 0 0 0 0 0.5 0 0 0 0 0 0 0 0 0 0.5 0 0 0.37 0 0
## // Expected heterozygosity: 0 0 0.42 0.38 0 0 0 0 0 0 0 0 0 0 0 0.23 0 0 0 0.09 0.04 0 0.01 0 0 0 0 0 0.01 0 0 0 0 0.25 0 0 0 0 0 0 0 0 0.23 0 0 0.37 0 0 0 0 0 0 0 0 0 0 0 0.49 0 0 0 0 0 0 0 0 0 0.49 0 0 0.38 0 0
names(het2)
## [1] "n"         "n.by.pop"  "loc.n.all" "pop.n.all" "NA.perc"   "Hobs"     
## [7] "Hexp"
plot(het2$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", 
     main="Observed heterozygosity per locus")

plot(het2$Hobs, het2$Hexp, xlab="Hobs", ylab="Hexp", 
     main="Expected heterozygosity as a function of observed heterozygosity per locus")

#Not sure if that's significant or not...

#LD
library(pegas)

JPTloci <- genind2loci(pegas.JPT)

LD2(JPTloci, locus = c(1, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
#Strange... I think this means there is absolutely 100 percent no LD...

LD2(JPTloci, locus = c(10, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
LD2(JPTloci, locus = c(30, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
#Yeah, there's no LD here. 

Now I will do FIN:

#hardy-weinberg test
pegas.FIN <- vcfR2genind(FIN, sep = "[|/]")
hw.test(pegas.FIN, B = 0)
##                  chi^2 df Pr(chi^2 >)
## rs547870    0.00000000  0   1.0000000
## rs138655388 0.00000000  0   1.0000000
## rs79057843  0.98408008  1   0.3211936
## rs11219407  0.02779810  1   0.8675844
## rs539322343 0.00000000  0   1.0000000
## rs557336616 0.00000000  0   1.0000000
## rs182148733 0.00000000  0   1.0000000
## rs536792742 0.00000000  0   1.0000000
## rs201867883 0.00000000  0   1.0000000
## rs574962070 0.00000000  0   1.0000000
## rs542394318 0.00000000  0   1.0000000
## rs142498161 0.00000000  0   1.0000000
## rs572702527 0.00000000  0   1.0000000
## rs145971999 0.00000000  0   1.0000000
## rs138504511 0.00000000  0   1.0000000
## rs12422129  0.13297333  1   0.7153689
## rs543960796 0.00000000  0   1.0000000
## rs146142166 0.00000000  0   1.0000000
## rs529332605 0.00000000  0   1.0000000
## rs200481059 0.09667969  1   0.7558511
## rs140019860 0.00000000  0   1.0000000
## rs533269341 0.00000000  0   1.0000000
## rs551254667 0.00000000  0   1.0000000
## rs113201955 0.00000000  0   1.0000000
## rs536630067 0.00000000  0   1.0000000
## rs199564594 0.00000000  0   1.0000000
## rs148563076 0.00000000  0   1.0000000
## rs397832341 0.00000000  0   1.0000000
## rs554354023 0.00000000  0   1.0000000
## rs572538067 0.00000000  0   1.0000000
## rs147538417 0.00000000  0   1.0000000
## rs558465945 0.00000000  0   1.0000000
## rs144654389 0.00000000  0   1.0000000
## rs503223    0.28010412  1   0.5966330
## rs561904511 0.00000000  0   1.0000000
## rs574079030 0.00000000  0   1.0000000
## rs541128230 0.00000000  0   1.0000000
## rs186225765 0.00000000  0   1.0000000
## rs546568617 0.00000000  0   1.0000000
## rs3017763   0.00000000  0   1.0000000
## rs3017764   0.00000000  0   1.0000000
## rs150347384 0.00000000  0   1.0000000
## rs1893764   0.13297333  1   0.7153689
## rs548590301 0.00000000  0   1.0000000
## rs567111591 0.00000000  0   1.0000000
## rs4084209   0.02779810  1   0.8675844
## rs201667584 0.00000000  0   1.0000000
## rs144455396 0.00000000  0   1.0000000
## rs201229870 0.00000000  0   1.0000000
## rs148375252 0.00000000  0   1.0000000
## rs141555350 0.00000000  0   1.0000000
## rs11219408  0.00000000  0   1.0000000
## rs191928020 0.00000000  0   1.0000000
## rs116021926 0.00000000  0   1.0000000
## rs144615790 0.00000000  0   1.0000000
## rs559573395 0.00000000  0   1.0000000
## rs577451354 0.00000000  0   1.0000000
## rs4936880   1.23164201  1   0.2670879
## rs563115832 0.00000000  0   1.0000000
## rs200511194 0.00000000  0   1.0000000
## rs548855939 0.00000000  0   1.0000000
## rs144812642 0.00000000  0   1.0000000
## rs146687807 0.00000000  0   1.0000000
## rs112431396 0.00000000  0   1.0000000
## rs201358968 0.00000000  0   1.0000000
## rs150326783 0.00000000  0   1.0000000
## rs370575449 0.00000000  0   1.0000000
## rs4936881   1.23164201  1   0.2670879
## rs61908597  0.38810360  1   0.5332979
## rs574009321 0.00000000  0   1.0000000
## rs4936882   0.02779810  1   0.8675844
## rs148404016 0.00000000  0   1.0000000
## rs202048997 0.00000000  0   1.0000000
#No SNPs out of HWE here

#Tajima's D test

FINdna <- vcfR2DNAbin(FIN)
## After extracting indels, 72 variants remain.
## 
Variant 3 processed
tajima.test(FINdna)
## $D
## [1] 1.535645
## 
## $Pval.normal
## [1] 0.1246255
## 
## $Pval.beta
## [1] 0.1453725
#D-val is greater than 0, but P-vals are not significant, so this is not interpreted as selection

#Heterozygosity
het3 <- summary(pegas.FIN)
het3
## 
## // Number of individuals: 99
## // Group sizes: 99
## // Number of alleles per locus: 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1
## // Number of alleles per group: 84
## // Percentage of missing data: 0 %
## // Observed heterozygosity: 0 0 0.41 0.38 0 0 0 0 0 0 0 0 0 0 0 0.07 0 0 0 0.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0.1 0 0 0 0 0 0 0 0 0.07 0 0 0.38 0 0 0 0 0 0 0 0 0 0 0 0.42 0 0 0 0 0 0 0 0 0 0.42 0.24 0 0.38 0 0
## // Expected heterozygosity: 0 0 0.46 0.38 0 0 0 0 0 0 0 0 0 0 0 0.07 0 0 0 0.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0.1 0 0 0 0 0 0 0 0 0.07 0 0 0.38 0 0 0 0 0 0 0 0 0 0 0 0.48 0 0 0 0 0 0 0 0 0 0.48 0.23 0 0.38 0 0
names(het3)
## [1] "n"         "n.by.pop"  "loc.n.all" "pop.n.all" "NA.perc"   "Hobs"     
## [7] "Hexp"
plot(het3$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", 
     main="Observed heterozygosity per locus")

plot(het3$Hobs, het3$Hexp, xlab="Hobs", ylab="Hexp", 
     main="Expected heterozygosity as a function of observed heterozygosity per locus")

#I think I need to compare all of the graphs for this to work? 

#LD

FINloci <- genind2loci(pegas.FIN)

LD2(FINloci, locus = c(1, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
#Same as above... no LD. Will check other loci just to be sure.

LD2(FINloci, locus = c(10, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
LD2(FINloci, locus = c(30, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN

Now I will do LWK

#hardy-weinberg test
pegas.LWK <- vcfR2genind(LWK, sep = "[|/]")
hw.test(pegas.LWK, B = 0)
##                   chi^2 df Pr(chi^2 >)
## rs547870    0.000000000  0   1.0000000
## rs138655388 0.000000000  0   1.0000000
## rs79057843  1.154084971  1   0.2826957
## rs11219407  0.501088614  1   0.4790222
## rs539322343 0.000000000  0   1.0000000
## rs557336616 0.000000000  0   1.0000000
## rs182148733 0.002550955  1   0.9597184
## rs536792742 0.000000000  0   1.0000000
## rs201867883 0.000000000  0   1.0000000
## rs574962070 0.000000000  0   1.0000000
## rs542394318 0.002550955  1   0.9597184
## rs142498161 0.000000000  0   1.0000000
## rs572702527 0.000000000  0   1.0000000
## rs145971999 0.000000000  0   1.0000000
## rs138504511 0.010308205  1   0.9191303
## rs12422129  0.045459184  1   0.8311619
## rs543960796 0.000000000  0   1.0000000
## rs146142166 0.002550955  1   0.9597184
## rs529332605 0.000000000  0   1.0000000
## rs200481059 0.010308205  1   0.9191303
## rs140019860 0.000000000  0   1.0000000
## rs533269341 0.000000000  0   1.0000000
## rs551254667 0.000000000  0   1.0000000
## rs113201955 0.132973329  1   0.7153689
## rs536630067 0.000000000  0   1.0000000
## rs199564594 0.000000000  0   1.0000000
## rs148563076 0.000000000  0   1.0000000
## rs397832341 0.000000000  0   1.0000000
## rs554354023 0.000000000  0   1.0000000
## rs572538067 0.000000000  0   1.0000000
## rs147538417 0.000000000  0   1.0000000
## rs558465945 0.000000000  0   1.0000000
## rs144654389 0.000000000  0   1.0000000
## rs503223    1.516920088  1   0.2180862
## rs561904511 0.000000000  0   1.0000000
## rs574079030 0.000000000  0   1.0000000
## rs541128230 0.000000000  0   1.0000000
## rs186225765 0.000000000  0   1.0000000
## rs546568617 0.132973329  1   0.7153689
## rs3017763   0.002550955  1   0.9597184
## rs3017764   0.002550955  1   0.9597184
## rs150347384 0.000000000  0   1.0000000
## rs1893764   0.422683137  1   0.5156012
## rs548590301 0.000000000  0   1.0000000
## rs567111591 0.000000000  0   1.0000000
## rs4084209   0.501088614  1   0.4790222
## rs201667584 0.000000000  0   1.0000000
## rs144455396 0.000000000  0   1.0000000
## rs201229870 0.000000000  0   1.0000000
## rs148375252 0.000000000  0   1.0000000
## rs141555350 0.000000000  0   1.0000000
## rs11219408  0.002550955  1   0.9597184
## rs191928020 0.000000000  0   1.0000000
## rs116021926 0.280104120  1   0.5966330
## rs144615790 0.010308205  1   0.9191303
## rs559573395 0.000000000  0   1.0000000
## rs577451354 0.000000000  0   1.0000000
## rs4936880   1.344606818  1   0.2462232
## rs563115832 0.000000000  0   1.0000000
## rs200511194 0.000000000  0   1.0000000
## rs548855939 0.000000000  0   1.0000000
## rs144812642 0.000000000  0   1.0000000
## rs146687807 0.000000000  0   1.0000000
## rs112431396 0.000000000  0   1.0000000
## rs201358968 0.000000000  0   1.0000000
## rs150326783 0.000000000  0   1.0000000
## rs370575449 0.000000000  0   1.0000000
## rs4936881   1.344606818  1   0.2462232
## rs61908597  0.096679688  1   0.7558511
## rs574009321 0.000000000  0   1.0000000
## rs4936882   1.734212803  1   0.1878738
## rs148404016 0.990000000  1   0.3197424
## rs202048997 0.000000000  0   1.0000000
#No SNPs out of HWE here

#Tajima's D test

LWKdna <- vcfR2DNAbin(LWK)
## After extracting indels, 72 variants remain.
## 
Variant 3 processed
tajima.test(LWKdna)
## $D
## [1] 0.3055142
## 
## $Pval.normal
## [1] 0.7599745
## 
## $Pval.beta
## [1] 0.7285134
#D-val is greater than 0, but P-vals are very not significant, so this is not interpreted as selection

#Heterozygosity
het4 <- summary(pegas.LWK)
het4
## 
## // Number of individuals: 99
## // Group sizes: 99
## // Number of alleles per locus: 1 1 2 2 1 1 2 1 1 1 2 1 1 1 2 2 1 2 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 2 1 2 1 1 2 1 1 1 1 1 2 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 2 2 1
## // Number of alleles per group: 96
## // Percentage of missing data: 0 %
## // Observed heterozygosity: 0 0 0.29 0.54 0 0 0.01 0 0 0 0.01 0 0 0 0.02 0.26 0 0.01 0 0.02 0 0 0 0.07 0 0 0 0 0 0 0 0 0 0.42 0 0 0 0 0.07 0.01 0.01 0 0.19 0 0 0.54 0 0 0 0 0 0.01 0 0.1 0.02 0 0 0.55 0 0 0 0 0 0 0 0 0 0.55 0.06 0 0.57 0.18 0
## // Expected heterozygosity: 0 0 0.33 0.5 0 0 0.01 0 0 0 0.01 0 0 0 0.02 0.26 0 0.01 0 0.02 0 0 0 0.07 0 0 0 0 0 0 0 0 0 0.38 0 0 0 0 0.07 0.01 0.01 0 0.21 0 0 0.5 0 0 0 0 0 0.01 0 0.1 0.02 0 0 0.49 0 0 0 0 0 0 0 0 0 0.49 0.06 0 0.5 0.17 0
names(het4)
## [1] "n"         "n.by.pop"  "loc.n.all" "pop.n.all" "NA.perc"   "Hobs"     
## [7] "Hexp"
plot(het4$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", 
     main="Observed heterozygosity per locus")

plot(het4$Hobs, het4$Hexp, xlab="Hobs", ylab="Hexp", 
     main="Expected heterozygosity as a function of observed heterozygosity per locus")

#This one appears to have less heterozygosity than the rest... is that significant?

#LD

LWKloci <- genind2loci(pegas.LWK)

LD2(LWKloci, locus = c(1, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
#Same as above. Nothing.

LD2(LWKloci, locus = c(10, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
LD2(LWKloci, locus = c(30, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN

Now I will do PUR

#hardy-weinberg test
pegas.PUR <- vcfR2genind(PUR, sep = "[|/]")
hw.test(pegas.PUR, B = 0)
##                   chi^2 df Pr(chi^2 >)
## rs547870    0.000000000  0  1.00000000
## rs138655388 0.000000000  0  1.00000000
## rs79057843  1.958935205  1  0.16162724
## rs11219407  1.424206174  1  0.23271233
## rs539322343 0.002427128  1  0.96070740
## rs557336616 0.000000000  0  1.00000000
## rs182148733 0.000000000  0  1.00000000
## rs536792742 0.000000000  0  1.00000000
## rs201867883 0.000000000  0  1.00000000
## rs574962070 0.000000000  0  1.00000000
## rs542394318 0.000000000  0  1.00000000
## rs142498161 0.000000000  0  1.00000000
## rs572702527 0.000000000  0  1.00000000
## rs145971999 0.000000000  0  1.00000000
## rs138504511 0.000000000  0  1.00000000
## rs12422129  0.462222222  1  0.49658724
## rs543960796 0.000000000  0  1.00000000
## rs146142166 0.000000000  0  1.00000000
## rs529332605 0.000000000  0  1.00000000
## rs200481059 0.063093014  1  0.80167245
## rs140019860 0.000000000  0  1.00000000
## rs533269341 0.000000000  0  1.00000000
## rs551254667 0.000000000  0  1.00000000
## rs113201955 0.000000000  0  1.00000000
## rs536630067 0.000000000  0  1.00000000
## rs199564594 0.022272457  1  0.88136458
## rs148563076 0.000000000  0  1.00000000
## rs397832341 0.000000000  0  1.00000000
## rs554354023 0.000000000  0  1.00000000
## rs572538067 0.000000000  0  1.00000000
## rs147538417 0.000000000  0  1.00000000
## rs558465945 0.000000000  0  1.00000000
## rs144654389 0.000000000  0  1.00000000
## rs503223    0.159102207  1  0.68998444
## rs561904511 0.000000000  0  1.00000000
## rs574079030 0.000000000  0  1.00000000
## rs541128230 0.000000000  0  1.00000000
## rs186225765 0.000000000  0  1.00000000
## rs546568617 0.000000000  0  1.00000000
## rs3017763   0.000000000  0  1.00000000
## rs3017764   0.000000000  0  1.00000000
## rs150347384 0.000000000  0  1.00000000
## rs1893764   0.389837568  1  0.53238480
## rs548590301 0.000000000  0  1.00000000
## rs567111591 0.000000000  0  1.00000000
## rs4084209   0.615384615  1  0.43276758
## rs201667584 0.000000000  0  1.00000000
## rs144455396 0.000000000  0  1.00000000
## rs201229870 0.000000000  0  1.00000000
## rs148375252 0.000000000  0  1.00000000
## rs141555350 0.000000000  0  1.00000000
## rs11219408  0.000000000  0  1.00000000
## rs191928020 0.000000000  0  1.00000000
## rs116021926 0.022272457  1  0.88136458
## rs144615790 0.000000000  0  1.00000000
## rs559573395 0.000000000  0  1.00000000
## rs577451354 0.000000000  0  1.00000000
## rs4936880   3.190143318  1  0.07408352
## rs563115832 0.000000000  0  1.00000000
## rs200511194 0.000000000  0  1.00000000
## rs548855939 0.000000000  0  1.00000000
## rs144812642 0.000000000  0  1.00000000
## rs146687807 0.000000000  0  1.00000000
## rs112431396 0.000000000  0  1.00000000
## rs201358968 0.000000000  0  1.00000000
## rs150326783 0.000000000  0  1.00000000
## rs370575449 0.000000000  0  1.00000000
## rs4936881   4.077146006  1  0.04346702
## rs61908597  1.390901939  1  0.23825284
## rs574009321 0.000000000  0  1.00000000
## rs4936882   1.164640026  1  0.28050531
## rs148404016 0.022272457  1  0.88136458
## rs202048997 0.000000000  0  1.00000000
#rs4936881 is not in HWE, which, interestingly, is not one that is not in HWE globally. 

#Tajima's D test

PURdna <- vcfR2DNAbin(PUR)
## After extracting indels, 72 variants remain.
## 
Variant 3 processed
tajima.test(PURdna)
## $D
## [1] 0.6941448
## 
## $Pval.normal
## [1] 0.4875914
## 
## $Pval.beta
## [1] 0.4812697
#D-val is greater than 0, but P-vals are not significant, so this is not interpreted as selection

#Heterozygosity
het5 <- summary(pegas.PUR)
het5
## 
## // Number of individuals: 104
## // Group sizes: 104
## // Number of alleles per locus: 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 2 2 1
## // Number of alleles per group: 88
## // Percentage of missing data: 0 %
## // Observed heterozygosity: 0 0 0.57 0.41 0.01 0 0 0 0 0 0 0 0 0 0 0.12 0 0 0 0.05 0 0 0 0 0 0.03 0 0 0 0 0 0 0 0.14 0 0 0 0 0 0 0 0 0.12 0 0 0.4 0 0 0 0 0 0 0 0.03 0 0 0 0.58 0 0 0 0 0 0 0 0 0 0.59 0.1 0 0.4 0.03 0
## // Expected heterozygosity: 0 0 0.5 0.37 0.01 0 0 0 0 0 0 0 0 0 0 0.12 0 0 0 0.05 0 0 0 0 0 0.03 0 0 0 0 0 0 0 0.15 0 0 0 0 0 0 0 0 0.11 0 0 0.38 0 0 0 0 0 0 0 0.03 0 0 0 0.49 0 0 0 0 0 0 0 0 0 0.49 0.11 0 0.37 0.03 0
names(het5)
## [1] "n"         "n.by.pop"  "loc.n.all" "pop.n.all" "NA.perc"   "Hobs"     
## [7] "Hexp"
plot(het5$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", 
     main="Observed heterozygosity per locus")

plot(het5$Hobs, het5$Hexp, xlab="Hobs", ylab="Hexp", 
     main="Expected heterozygosity as a function of observed heterozygosity per locus")

#possibly less heterozygosity than normal?

#LD

PURloci <- genind2loci(pegas.PUR)

LD2(PURloci, locus = c(1, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
#Yep. None.

LD2(PURloci, locus = c(10, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
LD2(PURloci, locus = c(30, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN

Finally, I will do PJL.

#hardy-weinberg test
pegas.PJL <- vcfR2genind(PJL, sep = "[|/]")
hw.test(pegas.PJL, B = 0)
##                   chi^2 df Pr(chi^2 >)
## rs547870    0.000000000  0  1.00000000
## rs138655388 0.010637119  1  0.91785471
## rs79057843  0.828758138  1  0.36263187
## rs11219407  2.853745541  1  0.09116133
## rs539322343 0.000000000  0  1.00000000
## rs557336616 0.000000000  0  1.00000000
## rs182148733 0.000000000  0  1.00000000
## rs536792742 0.000000000  0  1.00000000
## rs201867883 0.000000000  0  1.00000000
## rs574962070 0.000000000  0  1.00000000
## rs542394318 0.000000000  0  1.00000000
## rs142498161 0.000000000  0  1.00000000
## rs572702527 0.000000000  0  1.00000000
## rs145971999 0.000000000  0  1.00000000
## rs138504511 0.000000000  0  1.00000000
## rs12422129  0.010637119  1  0.91785471
## rs543960796 0.000000000  0  1.00000000
## rs146142166 0.000000000  0  1.00000000
## rs529332605 0.000000000  0  1.00000000
## rs200481059 0.181474480  1  0.67010915
## rs140019860 0.000000000  0  1.00000000
## rs533269341 0.000000000  0  1.00000000
## rs551254667 0.000000000  0  1.00000000
## rs113201955 0.000000000  0  1.00000000
## rs536630067 0.000000000  0  1.00000000
## rs199564594 0.000000000  0  1.00000000
## rs148563076 0.000000000  0  1.00000000
## rs397832341 0.000000000  0  1.00000000
## rs554354023 0.000000000  0  1.00000000
## rs572538067 0.000000000  0  1.00000000
## rs147538417 0.000000000  0  1.00000000
## rs558465945 0.000000000  0  1.00000000
## rs144654389 0.000000000  0  1.00000000
## rs503223    1.185185185  1  0.27630292
## rs561904511 0.002631507  1  0.95908789
## rs574079030 0.000000000  0  1.00000000
## rs541128230 0.000000000  0  1.00000000
## rs186225765 0.000000000  0  1.00000000
## rs546568617 0.000000000  0  1.00000000
## rs3017763   0.068632217  1  0.79333878
## rs3017764   0.068632217  1  0.79333878
## rs150347384 0.000000000  0  1.00000000
## rs1893764   0.010637119  1  0.91785471
## rs548590301 0.000000000  0  1.00000000
## rs567111591 0.000000000  0  1.00000000
## rs4084209   2.853745541  1  0.09116133
## rs201667584 0.000000000  0  1.00000000
## rs144455396 0.000000000  0  1.00000000
## rs201229870 0.000000000  0  1.00000000
## rs148375252 0.000000000  0  1.00000000
## rs141555350 0.000000000  0  1.00000000
## rs11219408  0.000000000  0  1.00000000
## rs191928020 0.000000000  0  1.00000000
## rs116021926 0.000000000  0  1.00000000
## rs144615790 0.000000000  0  1.00000000
## rs559573395 0.000000000  0  1.00000000
## rs577451354 0.000000000  0  1.00000000
## rs4936880   1.330129131  1  0.24878225
## rs563115832 0.000000000  0  1.00000000
## rs200511194 0.000000000  0  1.00000000
## rs548855939 0.000000000  0  1.00000000
## rs144812642 0.000000000  0  1.00000000
## rs146687807 0.000000000  0  1.00000000
## rs112431396 0.000000000  0  1.00000000
## rs201358968 0.000000000  0  1.00000000
## rs150326783 0.000000000  0  1.00000000
## rs370575449 0.000000000  0  1.00000000
## rs4936881   1.330129131  1  0.24878225
## rs61908597  0.344299531  1  0.55735790
## rs574009321 0.000000000  0  1.00000000
## rs4936882   2.853745541  1  0.09116133
## rs148404016 0.000000000  0  1.00000000
## rs202048997 0.000000000  0  1.00000000
#technically all are in HWE, but there are a few SNPs that are close to not being in HWE.  

#Tajima's D test

PJLdna <- vcfR2DNAbin(PJL)
## After extracting indels, 72 variants remain.
## 
Variant 3 processed
tajima.test(PJLdna)
## $D
## [1] 0.7594609
## 
## $Pval.normal
## [1] 0.4475769
## 
## $Pval.beta
## [1] 0.4457295
#D-val is greater than 0, but P-vals are not significant, so this is not interpreted as selection

#Heterozygosity
het6 <- summary(pegas.PJL)
het6
## 
## // Number of individuals: 96
## // Group sizes: 96
## // Number of alleles per locus: 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1
## // Number of alleles per group: 88
## // Percentage of missing data: 0 %
## // Observed heterozygosity: 0 0.02 0.41 0.4 0 0 0 0 0 0 0 0 0 0 0 0.02 0 0 0 0.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0.1 0.01 0 0 0 0 0.05 0.05 0 0.02 0 0 0.4 0 0 0 0 0 0 0 0 0 0 0 0.41 0 0 0 0 0 0 0 0 0 0.41 0.14 0 0.4 0 0
## // Expected heterozygosity: 0 0.02 0.45 0.48 0 0 0 0 0 0 0 0 0 0 0 0.02 0 0 0 0.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0.12 0.01 0 0 0 0 0.05 0.05 0 0.02 0 0 0.48 0 0 0 0 0 0 0 0 0 0 0 0.46 0 0 0 0 0 0 0 0 0 0.46 0.14 0 0.48 0 0
names(het6)
## [1] "n"         "n.by.pop"  "loc.n.all" "pop.n.all" "NA.perc"   "Hobs"     
## [7] "Hexp"
plot(het6$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", 
     main="Observed heterozygosity per locus")

plot(het6$Hobs, het6$Hexp, xlab="Hobs", ylab="Hexp", 
     main="Expected heterozygosity as a function of observed heterozygosity per locus")

#So it does seem that there are differences in heterozygosity in each of these populations, but what does that mean?

#LD

PJLloci <- genind2loci(pegas.PJL)

LD2(PJLloci, locus = c(1, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
#:)

LD2(PJLloci, locus = c(10, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN
LD2(PJLloci, locus = c(30, 73), details = TRUE)
## $Delta
##   0
## 0 0
## 
## $T2
##    T2    df P-val 
##   NaN     0   NaN

Before I move on to the discussion, I want to make a graphic with all of the heterozygosity plots together, just because it will make heterozygosity easier to discuss.

par(mfrow=c(3,2))
plot(het1$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", main="Heterozygosity in All Populations")
plot(het2$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", main="Heterozygosity in JPT")
plot(het3$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", main="Heterozygosity in FIN")
plot(het4$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", main="Heterozygosity in LWK")
plot(het5$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", main="Heterozygosity in PUR")
plot(het6$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", main="Heterozygosity in PJL")

Discussion

This is a bulleted summary of my conclusions about this data:


To conclude, I would like to say that this was an interesting preliminary study. I suspected there would not be much in the way of selection signatures in OR10G4, since humans do not rely heavily on smell. It would have certainly been interesting if there was some selection, based on different odorants in the environment that people need to hone in on, but as of right now that looks like it is not the case. It is promising, however, that there seems to be SNPs globally that are not in HWE (and are being acted on by selection), as well as some almost-significant Tajima’s D statistics. Additionally, it’s worth noting that there may have been some R error on my part, since this was my first time doing any sort of SNP analysis in R. Hopefully, if there were errors (for example, the lack of a reference sequence in my DNAbin object may have screwed things up), I can iron them out over the summer. Nevertheless I think this was a good preliminary experiment in genomics in R. I expect to explore more populations, and possibly more genes (actually, I am interested in looking at the gene OR7D4, which is known to cause differences in odor perception of androstenone) that have SNPs that influence odor perception in the near future.

References


  1. Mainland, Joel et al. “The Missense of Smell: Functional Variability in the Human Odorant Receptor Repertoire.” Nature Neuroscience 17 (December 8, 2013): 114-20. Accessed May 6, 2018. doi:10.1038/nn.3598.

  2. Mainland, Joel et al. “The Missense of Smell: Functional Variability in the Human Odorant Receptor Repertoire.” Nature Neuroscience 17 (December 8, 2013): 114-20. Accessed May 6, 2018. doi:10.1038/nn.3598.

  3. Keller, Andreas et al. “Genetic Variation in a Human Odorant Receptor Alters Odour Perception.” Nature 449 (September 16, 2007): 468-72. Accessed May 6, 2018. doi:10.1038/nature06162.

  4. Mainland, Joel et al. “The Missense of Smell: Functional Variability in the Human Odorant Receptor Repertoire.” Nature Neuroscience 17 (December 8, 2013): 114-20. Accessed May 6, 2018. doi:10.1038/nn.3598.

  5. Mainland, Joel et al. “The Missense of Smell: Functional Variability in the Human Odorant Receptor Repertoire.” Nature Neuroscience 17 (December 8, 2013): 114-20. Accessed May 6, 2018. doi:10.1038/nn.3598.

  6. EMBL-EBI. “Data Slicer.” Ensembl. Last modified March 2014. Accessed May 6, 2018. http://grch37.ensembl.org/Homo_sapiens/Tools/DataSlicer.